Genetic Programming Theory and Practice XVII by Unknown
ISBN: 9783030399580
Publisher: Springer International Publishing
10.3.4 Safe Exploration
The problem of safe exploration is how evolution (or individuals capable of life-time learning) can explore new solutions without ever (or only very rarely) taking catastrophic actions, i.e. ones that harm valuable aspects of the environment, including humans or expensive equipment such as robots. Note that safe exploration remains a problem even if objectives are correctly specified: Even if a fitness function correctly identifies all unacceptable negative side-effects, and a properly-trained agent would thus avoid such effects, during learning an agent might still undertake catastrophic actions. For example, the cleaning robot may suffer a fitness penalty for breaking a vase, but it still needs to experience that penalty during training to learn to avoid breaking it. A related problem is that given a robotic controller that behaves safely, there is no guarantee that an arbitrary mutation of it will also be safe. The danger of exploration is a deep philosophical problem, in that the very act of exploration seems inherently to be about stepping into the unknown. However, humans can often successfully explore new possibilities and emerge relatively unscathed (sometimes using mental models to predict whether a new strategy would be catastrophic before trying it, somewhat similarly to model-based RL [63]), suggesting that practical solutions may be possible.
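The model-based idea mentioned above (predicting whether a strategy would be catastrophic before trying it) can be sketched minimally. Everything below is illustrative and not from the chapter: `dynamics_model`, `is_catastrophic`, and `fallback` are hypothetical names, and the one-dimensional "vase" domain is a toy stand-in for a real environment model.

```python
def safe_act(state, propose_action, dynamics_model, is_catastrophic, fallback):
    """Screen a proposed action with a learned world model before executing it.

    Illustrative sketch: `dynamics_model` predicts the next state,
    `is_catastrophic` flags states to avoid (e.g. a broken vase), and
    `fallback` is a known-safe action (e.g. stop in place).
    """
    action = propose_action(state)
    predicted_next = dynamics_model(state, action)
    if is_catastrophic(predicted_next):
        return fallback  # veto the risky action before it is ever taken
    return action

# Toy domain: the state is a robot position on a line; position >= 10
# breaks a vase. The "model" here happens to be perfect; a learned model
# would only approximate the true dynamics.
dynamics = lambda s, a: s + a
catastrophe = lambda s: s >= 10
policy = lambda s: 3                  # naive policy: always step +3

action = safe_act(8, policy, dynamics, catastrophe, fallback=0)
# The model predicts 8 + 3 = 11 >= 10, so the fallback action 0 is chosen.
```

The key property is that the penalty is paid inside the model, not in the world: the agent never has to experience the broken vase to avoid it, at the cost of relying on the model's accuracy.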
There are two main ways that real-world accidents from unsafe exploration can emerge in EC. First, take the case of learning a plastic policy (e.g. a policy that learns from experience during its lifetime [58, 59]). For example, a robot might be trained to explore any environment it is embedded within, in search of a particular goal. In effect, such an agent must learn how to explore, and if the deployment plan involves the real world (through embodied evolution, or crossing the reality gap), then there are risks from unsafe exploration. For example, in a new environment, a learned exploratory strategy might lead the robot to damage itself. Second, there is the case where a learned (non-plastic) policy is either trained in the real world (embodied evolution) or fine-tuned in the real world after being trained in simulation. In this case, exploring the space of policies (through mutations of existing policies) may result in unsafe policies. For example, in some robotics domains solutions are known to be fragile, i.e. most mutations result in degenerate (possibly damaging) behavior [33, 38]. For concreteness, a robot trained to walk successfully in simulation may lose some performance when transferred across the reality gap, and there is no guarantee that perturbations of the transferred policy (explored in hopes they will improve the walking policy) will not cause the robot to fall and harm itself.
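One common mitigation for the second case is to screen mutations in simulation before any child policy reaches hardware. The sketch below assumes hypothetical helpers (`mutate`, `simulate`, `min_fitness`) and a toy fitness function; a real screen would also flag falls, self-collisions, and similar failure modes, not just low fitness.

```python
import random

def screened_mutation(parent, mutate, simulate, min_fitness, max_tries=20):
    """Accept only mutated policies that clear a safety bar in simulation.

    Illustrative sketch: `mutate` perturbs a policy's parameters and
    `simulate` returns a fitness estimate. A child must score at least
    `min_fitness` in simulation before it is ever run on the real robot.
    """
    for _ in range(max_tries):
        child = mutate(parent)
        if simulate(child) >= min_fitness:
            return child   # deemed safe enough to try on hardware
    return parent          # no acceptable child found; keep the current policy

random.seed(0)
parent = [0.5, 0.5]
mutate = lambda p: [w + random.gauss(0, 0.1) for w in p]
simulate = lambda p: 1.0 - abs(sum(p) - 1.0)   # toy fitness: weights sum to 1
child = screened_mutation(parent, mutate, simulate, min_fitness=0.9)
```

This only shifts the risk onto the simulator, of course: a child that passes the simulated screen can still fail across the reality gap, which is exactly the residual danger the text describes.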
Overall, it may be impossible to solve the issue of safe exploration without involving some form of human oversight. The reason is that learning what is unsafe seemingly requires either: (1) an accurate model of the world that includes robust identification of catastrophes, (2) labelled data of all possible causes of unsafe scenarios in a domain, or (3) active experience in the domain with feedback from an overseer that prevents unsafe actions from being taken.
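Option (3) above, active experience with overseer feedback, can be sketched as a simple loop in which a human (or trusted monitor) vetoes unsafe actions during training and each veto is recorded as a labelled example, so that a learned safety filter could eventually take over. All names here are hypothetical, not from the chapter.

```python
def explore_with_overseer(actions, overseer_blocks, execute):
    """Exploration gated by an overseer, accumulating labelled safety data.

    Illustrative sketch: `overseer_blocks(a)` is the overseer's veto
    (True means the action must not be taken), and `execute(a)` runs an
    approved action in the environment.
    """
    labelled = []
    for a in actions:
        if overseer_blocks(a):
            labelled.append((a, "unsafe"))   # blocked before execution
        else:
            execute(a)
            labelled.append((a, "safe"))
    return labelled

# Toy run: actions are step sizes, and the overseer blocks anything >= 10.
log = explore_with_overseer([2, 12, 5], lambda a: a >= 10, lambda a: None)
# log == [(2, "safe"), (12, "unsafe"), (5, "safe")]
```

The labelled pairs collected this way are precisely the data that options (1) and (2) presuppose, which is why some human oversight appears hard to eliminate entirely.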